Data Import

We import the master data, which has already been cleaned.

library(tidyverse)
library(plotly)

setwd("/Users/mengxisun/Documents/Fall 2022/Data Science I/p8105-finalproject.github.io/")

data = read_csv(file = "./data/master.csv") %>%
  janitor::clean_names()
## Rows: 61 Columns: 20
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): borough, region
## dbl (18): age_less_20, age_20_24, age_25_29, age_30_34, age_35_39, age_plus_...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

From the master spreadsheet, we selected the columns and rows needed to analyze induced abortion ratios by covariate category.

We first used pivot_longer to reshape the data into long format and then used plotly to create interactive bar plots.
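
As a minimal sketch of the reshaping step described above, the example below applies pivot_longer to a small toy table. The borough labels and values in toy_wide are hypothetical and are not taken from master.csv; only the column naming pattern follows the real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy values, NOT taken from master.csv
toy_wide <- tibble(
  borough = c("Borough A", "Borough B"),
  age_less_20 = c(900, 700),
  age_20_24 = c(600, 500)
)

# One row per borough/age pair, which is the shape plot_ly expects
toy_long <- toy_wide %>%
  pivot_longer(
    age_less_20:age_20_24,
    names_to = "age",
    values_to = "abortion"
  ) %>%
  # Fix the factor order so the bars follow age order, not alphabetical order
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24")))

toy_long
```

Setting the factor levels explicitly is what keeps the age categories in their natural order on the x-axis.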

NYC plots

NYC Age

age = 
  data %>%
  select(1:7) %>%
  slice(3:7)

age %>%
  pivot_longer(
    age_less_20:age_plus_40,
    names_to = "age", 
    values_to = "abortion"
  ) %>%
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24", "age_25_29", "age_30_34", "age_35_39", "age_plus_40"))) %>% 
  plot_ly(x = ~age, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Age for Boroughs in New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))
age %>% filter(age_less_20 == max(age_less_20))
## # A tibble: 1 × 7
##   borough   age_less_20 age_20_24 age_25_29 age_30_34 age_35_39 age_plus_40
##   <chr>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>       <dbl>
## 1 Manhattan       1965.     1769.      988.      312.      256.         312

From the bar plots, it is evident that those younger than 20 had the highest abortion ratios across the boroughs in NYC in 2019. These ratios seem to decrease as the age categories increase. When comparing boroughs, Manhattan had the highest abortion ratios in categories age_less_20 to age_25_29. Staten Island had the lowest abortion ratios overall, except for category age_20_24. The highest abortion ratio across all boroughs and age categories was for Manhattan in the age_less_20 category, with 1964.9 induced abortions per 1,000 live births.
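
Extreme values like the ones quoted above can be looked up in the long-format table instead of being read off the plot. The sketch below shows one possible way to do this with slice_max and slice_min from dplyr; toy_age and its values are hypothetical and are not the project data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy values, NOT taken from master.csv
toy_age <- tibble(
  borough = c("Borough A", "Borough B"),
  age_less_20 = c(900, 700),
  age_20_24 = c(600, 500),
  age_plus_40 = c(300, 250)
)

toy_age_long <- toy_age %>%
  pivot_longer(-borough, names_to = "age", values_to = "abortion")

# Row with the highest ratio across all boroughs and age categories
highest <- toy_age_long %>% slice_max(abortion, n = 1)

# Row with the lowest ratio across all boroughs and age categories
lowest <- toy_age_long %>% slice_min(abortion, n = 1)

highest
lowest
```

Unlike filtering on a single column, this searches every borough and age category at once.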

NYC Race

race = 
  data %>%
  select(1,9:12) %>%
  slice(3:7)

race %>%
  pivot_longer(
    nh_white_only_ratio:h_total,
    names_to = "race", 
    values_to = "abortion"
  ) %>%
  mutate(race = factor(race, levels = c("nh_white_only_ratio", "nh_black_only_ratio", "nh_other_ratio", "h_total"))) %>% 
  plot_ly(x = ~race, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Race for Boroughs in New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))

From the bar plots, it is evident that those who identified as Non-Hispanic Black had the highest abortion ratios across all boroughs in NYC. Among those who identified as Non-Hispanic Black, ratios were highest in Manhattan, with 1,228.3 induced abortions per 1,000 live births. Those who identified as Non-Hispanic White and Non-Hispanic Other had some of the lowest abortion ratios in NYC. The lowest ratio across all boroughs and race groups was for those who identified as Non-Hispanic White only and lived in Brooklyn, with a ratio of 88.6 induced abortions per 1,000 live births.

NYC Financial Plans

financial_plan = 
  data %>%
  select(1,15:18) %>%
  slice(3:7)

financial_plan %>%  
  pivot_longer(
    medicaid:not_stated,
    names_to = "financial_plan", 
    values_to = "abortion"
  ) %>%
  mutate(financial_plan = factor(financial_plan, levels = c("medicaid", "self_pay", "other_insurance", "not_stated"))) %>% 
  plot_ly(x = ~financial_plan, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Financial Plan for Boroughs in New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))

From the bar plots, those categorized as having other_insurance had the highest abortion ratios of all financial plans in NYC. Because abortion ratios by financial plan were not provided as they were in the other data sets, we calculated them manually as (induced abortions / live births) * 1,000. A major limitation of these numbers is that people who have induced abortions and people who give birth tend to use different insurance plans. People who give birth are less likely to use other_insurance than those who have induced abortions because of the inherent differences between the procedures. Those who used self_pay for induced abortions had a relatively high abortion ratio as well. This is plausible given the nature of the procedure and possible restrictions on abortion coverage under health insurance plans.
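
The manual ratio calculation described above can be sketched as follows. The column names induced_abortions and live_births, and all counts in toy_counts, are assumptions made for illustration; the real source columns are not shown in this document.

```r
library(dplyr)

# Hypothetical counts, NOT the project data
toy_counts <- tibble(
  financial_plan = c("medicaid", "self_pay"),
  induced_abortions = c(500, 120),
  live_births = c(2000, 300)
)

# Abortion ratio = induced abortions per 1,000 live births
toy_ratios <- toy_counts %>%
  mutate(abortion_ratio = induced_abortions / live_births * 1000)

toy_ratios
```

In this toy table, the smaller self_pay group has the larger ratio because its denominator of live births is small. This is the same denominator effect that makes the financial-plan ratios hard to interpret.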

NY graphs

NY Age

NY Race

NY Financial Plans

NY vs. NYC graphs

NY vs. NYC Age

age_total = 
  data %>%
  select(1:7) %>%
  slice(1:2)

age_total %>%
  pivot_longer(
    age_less_20:age_plus_40,
    names_to = "age", 
    values_to = "abortion"
  ) %>%
  mutate(age = factor(age, levels = c("age_less_20", "age_20_24", "age_25_29", "age_30_34", "age_35_39", "age_plus_40"))) %>% 
  plot_ly(x = ~age, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Age for New York State and New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))

From the bar plots, it is evident that New York City had higher abortion ratios than New York State across all age categories. Generally, these ratios seem to decrease as the age categories increase. Ratios were highest in the age_less_20 category for both New York City and New York State.
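
To put numbers on the city-versus-state comparison above, one option is to spread the two rows into side-by-side columns and take the difference. This is only a sketch: the row labels "state" and "city" and all values in toy_total are hypothetical, and the real labels in the borough column may differ.

```r
library(dplyr)
library(tidyr)

# Hypothetical labels and values, NOT taken from master.csv
toy_total <- tibble(
  borough = c("state", "city"),
  age_less_20 = c(800, 1200),
  age_20_24 = c(500, 700)
)

toy_gap <- toy_total %>%
  pivot_longer(-borough, names_to = "age", values_to = "abortion") %>%
  # One column per region so the two ratios sit side by side
  pivot_wider(names_from = borough, values_from = abortion) %>%
  # Positive gap means the city ratio exceeds the state ratio
  mutate(gap = city - state)

toy_gap
```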

NY vs. NYC Race

race_total = 
  data %>%
  select(1,9:12) %>%
  slice(1:2)

race_total %>%
  pivot_longer(
    nh_white_only_ratio:h_total,
    names_to = "race", 
    values_to = "abortion"
  ) %>%
  mutate(race = factor(race, levels = c("nh_white_only_ratio", "nh_black_only_ratio", "nh_other_ratio", "h_total"))) %>% 
  plot_ly(x = ~race, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Race for New York State and New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))

NY vs. NYC Financial Plans

financial_plan_total = 
  data %>%
  select(1,15:18) %>%
  slice(1:2)

financial_plan_total %>%  
  pivot_longer(
    medicaid:not_stated,
    names_to = "financial_plan", 
    values_to = "abortion"
  ) %>%
  mutate(financial_plan = factor(financial_plan, levels = c("medicaid", "self_pay", "other_insurance", "not_stated"))) %>% 
  plot_ly(x = ~financial_plan, y = ~abortion, color = ~borough, type = "bar", colors = "viridis") %>%   
  layout(title = 'Abortion Ratios by Financial Plan for New York State and New York City', yaxis = list(title = 'Number of Induced Abortions per 1,000 Live Births'))